Minimized Database of Unit Selection in Visual Speech Synthesis without Loss of Naturalness

Authors

  • Kang Liu
  • Jörn Ostermann

Abstract

Image-based modeling is very successful at creating realistic facial animations. Applications with dialog systems, such as eLearning and customer information services, can integrate facial animations with synthesized speech on websites to improve human-machine communication. However, downloading the database of 11,594 mouth images (about 120 MB in JPEG format) used by the talking head takes about 15 minutes at 150 kBps. This paper presents a prototype framework for two-step database minimization. First, the key mouth images are identified by clustering algorithms, and similar mouth images are discarded. Second, the selected key mouth images are further compressed with JPEG. Clustering algorithms based on MST (Minimum Spanning Tree), RSST (Recursive Shortest Spanning Tree) and LBG (Linde-Buzo-Gray) are developed and evaluated. Our experiments demonstrate that the LBG-based clustering algorithm reduces the number of mouth images and that JPEG compression then shrinks the database to 8 MB. The minimized database still generates facial animations in CIF format without loss of naturalness and meets the requirements of talking heads for Internet applications.
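The abstract names LBG-based clustering as the step that picks the key mouth images. The following is a minimal sketch of that general idea under stated assumptions, not the authors' implementation. It assumes each mouth image has already been reduced to a fixed-length feature vector (the feature choice is hypothetical), uses squared Euclidean distance, grows the codebook by LBG splitting (here with an additive ±eps perturbation), and refines each codebook size with k-means style iterations. Because a database must store real images rather than averaged codewords, it then keeps the real image nearest to each codeword; that selection rule is also an assumption. The names `lbg`, `select_key_images`, `eps` and `tol` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical sketch: image features and nearest-image selection are assumptions.
Vector = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def nearest(codebook: List[Vector], v: Vector) -> int:
    """Index of the codeword closest to v (first one wins on ties)."""
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k], v))


def lbg(data: List[Vector], size: int, eps: float = 0.01, tol: float = 1e-4) -> List[Vector]:
    """Linde-Buzo-Gray codebook design by repeated splitting.

    The codebook grows 1, 2, 4, ... until it has at least `size` codewords,
    so `size` should be a power of two to get an exact count.
    """
    codebook: List[Vector] = [centroid(data)]
    while len(codebook) < size:
        # Split every codeword into two slightly perturbed copies.
        codebook = [tuple(c + s for c in cw) for cw in codebook for s in (eps, -eps)]
        prev = float("inf")
        while True:
            # Assignment step: map every vector to its nearest codeword.
            clusters: List[List[Vector]] = [[] for _ in codebook]
            dist = 0.0
            for v in data:
                k = nearest(codebook, v)
                clusters[k].append(v)
                dist += sq_dist(codebook[k], v)
            dist /= len(data)
            # Update step: move each codeword to its cluster centroid;
            # an empty cluster keeps its old codeword.
            codebook = [centroid(cl) if cl else cw for cl, cw in zip(clusters, codebook)]
            # Stop once the relative drop in mean distortion is small.
            if prev - dist <= tol * dist:
                break
            prev = dist
    return codebook


def select_key_images(data: List[Vector], size: int) -> List[int]:
    """Indices of the real images closest to each LBG codeword."""
    codebook = lbg(data, size)
    keys = {min(range(len(data)), key=lambda i: sq_dist(data[i], cw)) for cw in codebook}
    return sorted(keys)


if __name__ == "__main__":
    # Toy 2-D "feature vectors": four groups of three images each.
    centers = (1, 11, 21, 31)
    data = [(float(c + d), 0.0) for c in centers for d in (-1, 0, 1)]
    print(select_key_images(data, 4))  # → [1, 4, 7, 10]
```

On this toy data the call keeps one representative image per group and discards the other eight. Like k-means, splitting-based LBG can settle in a local optimum on less favorable data, so the resulting codebook depends on how the initial splits fall.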

Similar resources

Study on Unit-Selection and Statistical Parametric Speech Synthesis Techniques

One of the interesting topics in the multimedia domain concerns enabling computers to produce speech. Speech synthesis grants computers the human ability to produce speech. Data-based and process-based approaches are the two main approaches to speech synthesis, and each has its own challenges. Unit-selection speech synthesis and statistical parametr...

Automatic pruning of unit selection speech databases for synthesis without loss of naturalness

In the paper we present our experiments with automatic pruning of speech databases that we created for unit-selection speech synthesis systems. Several algorithms have been tried and perceptually evaluated. An optimal speech database size has been reached at which the loss of naturalness due to unit pruning is not perceptible.

Perceptually-based Data-driven Join Co

Unit selection synthesis has improved the quality of synthetic speech by making it possible to concatenate speech from a large database to produce intelligible synthesis while preserving much of the naturalness of the original signal. Such synthesis is by no means perfect, however, and this paper describes work to achieve more optimal joins between concatenated units. Results from a psychoacous...

Perceptually-based data-driven join costs: comparing join types

Unit selection synthesis has improved the quality of synthetic speech by making it possible to concatenate speech from a large database to produce intelligible synthesis while preserving much of the naturalness of the original signal. Such synthesis is by no means perfect, however, and this paper describes work to achieve more optimal joins between concatenated units. Results from a psychoacous...

Prosody-based Naturalness Improvement in Thai Unit-selection Speech Synthesis

This paper presents a naturalness improvement in Thai unit-selection text-to-speech synthesis (TTS) based on prosody modeling. Although several modeling approaches for prosodic parameters in Thai speech have been proposed, they have not been proven to provide promising performance when practically assembled into a synthesizer. In this paper, two learning machines for phrase break and phoneme dura...

Publication date: 2009